Supervised vs. unsupervised learning of spectro-temporal speech features
Abstract
To overcome the limitations of purely spectral speech features, we previously introduced Hierarchical Spectro-Temporal (HIST) features. We showed that combining HIST with standard features reduces recognition errors in both clean and noisy conditions. The HIST features consist of two hierarchical layers whose filter functions are learned in a data-driven way. In this paper, we investigate how the methodology used to learn the second-layer filters influences recognition performance. We compare Non-negative Matrix Factorization (NMF), Non-negative Sparse Coding (NNSC), and Weight Coding (WC) on a noisy digit recognition task. NMF and NNSC are unsupervised learning algorithms, whereas WC also includes class-specific information in the learning process. Additionally, we investigate how a mismatch between the database used for learning the features and the one used for training and testing the recognition system influences performance.
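As a rough illustration of the unsupervised case, the sketch below factorizes a non-negative data matrix with NMF using the standard multiplicative updates for the squared-error objective. It is not the HIST pipeline from the paper: the function name `nmf`, the toy data, the rank, and the iteration count are all assumptions chosen for illustration, and the columns of `V` merely stand in for non-negative first-layer responses.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x samples) as W @ H.

    Uses multiplicative updates for the Frobenius-norm objective.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random non-negative initialization of basis (W) and activations (H).
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    errors = []
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Toy non-negative data with an underlying rank of 4 (hypothetical stand-in
# for patches of first-layer feature responses).
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 50))
W, H, errors = nmf(V, rank=4)
```

In this sketch the columns of `W` play the role of learned filters and `H` holds their activations. NNSC differs from NMF by adding a sparsity penalty on the activations, and WC additionally uses class labels; neither is shown here.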
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Spectro-temporal Gabor features as a front end for automatic speech recognition
A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...
Two-stage multi-target joint learning for monaural speech separation
Recently, supervised speech separation has been extensively studied and shown considerable promise. Due to the temporal continuity of speech, speech auditory features and separation targets present prominent spectro-temporal structures and strong correlations over the time-frequency (T-F) domain, which can be exploited for speech separation. However, many supervised speech separation methods in...
Comparison of different machine learning methods for extractive speech-to-speech summarization of Persian without using transcripts
In this paper, extractive speech summarization using different machine learning algorithms was investigated. Speech summarization extracts the important and salient segments of speech so that speech files can be accessed, searched, and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
Language Acquisition Embedded into Tutor-Robot Interaction
Children acquire language to a large extent through interaction with their caregivers. Inspired by this observation, we develop computational models and artifacts for the acquisition of language in an interactive scenario. The artifact bootstraps its representations with little a priori knowledge and can be taught by a human tutor. In this framework we investigate different aspects of the speech ...